Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Perform reachability analysis on a per-harness basis #2439

Merged
merged 8 commits into from
May 16, 2023

Conversation

celinval
Copy link
Contributor

@celinval celinval commented May 12, 2023

Description of changes:

Kani compiler used to generate one goto-program for all harnesses in one crate. In some cases, this actually had a negative impact on the harness verification time. This was first reported in #1659, and it is now blocking the toolchain upgrade from #2406.

The main changes were done in the compiler's module compiler_interface and the module project from the driver. The compiler will now gather all the harnesses beforehand and it will perform reachability + codegen steps for each harness. All files related to a harness goto-program will follow the naming convention bellow:

<BASE_NAME>_<MANGLED_NAME>.<EXTENSION>

This applies to symtab / goto / type_map / restriction files.

The metadata file is still generated once per target crate, and its name is still the same (<BASE_NAME>.kani-metadata.json).

On the driver side, the way we process the artifacts have changed. The driver will now read the metadata for each crate, and collect all artifacts based on the symtab goto file that is recorded in the metadata of each harness.

These changes do not apply for --function. We still keep all artifacts based on the crate's <BASE_NAME> and we have a separate logic to handle that. Fixing this is captured by #2129.

Resolved issues:

Resolves #1855

Related RFC:

Call-outs:

There are a few of TODOs in this code that I left to avoid making this PR even bigger. I'll try to cross them out in the next couple of weeks. The only one that I think is somewhat urgent is the one in the compiler_interface.rs about making sure the test description and mono item align. Although this logic works today, it is rather fragile.

Unfortunately, this PR will impact compilation time. We could try to optimize this logic, but any optimization I can think of will make this PR much bigger. Also, the best optimization might just be improving our goto codegen performance in general.

Testing:

  • How is this change tested?

  • Is this a refactor change? Yes

Checklist

  • Each commit message has a non-empty body, explaining why the change was made
  • Methods or procedures are documented
  • Regression or unit tests are included, or existing tests cover the modified code
  • My PR is restricted to a single feature or bugfix

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 and MIT licenses.

@celinval celinval marked this pull request as ready for review May 12, 2023 20:13
@celinval celinval requested a review from a team as a code owner May 12, 2023 20:13
@celinval
Copy link
Contributor Author

I still need to add the test I've been working on, but the code is ready for review.

@celinval
Copy link
Contributor Author

I still need to add the test I've been working on, but the code is ready for review.

Done!

Conflicts:
    kani-compiler/src/codegen_cprover_gotoc/compiler_interface.rs
Copy link
Contributor

@zhassan-aws zhassan-aws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's an initial set of comments. I'm having difficulty wrapping my head around how project, target crate, harness, etc are structured or related.

The driver will now read the metadata for each crate, and collect all artifacts based on the symtab goto file that is recorded in the metadata of each harness.

Does each crate metadata file contain metadata for each harness in that crate?

tests/expected/per-harness/drop.rs Show resolved Hide resolved
kani-driver/src/call_cbmc_viewer.rs Show resolved Hide resolved
kani-driver/src/session.rs Show resolved Hide resolved
tests/expected/per-harness/drop.rs Show resolved Hide resolved
kani-driver/src/project.rs Outdated Show resolved Hide resolved
@celinval
Copy link
Contributor Author

Here's an initial set of comments. I'm having difficulty wrapping my head around how project, target crate, harness, etc are structured or related.

The driver will now read the metadata for each crate, and collect all artifacts based on the symtab goto file that is recorded in the metadata of each harness.

Does each crate metadata file contain metadata for each harness in that crate?

Yes.That actually didn't change in this PR at all. It was a previous change.

Each crate has one metadata. Each crate metadata has a vector with harnesses metadata. Each harness metadata contains the name of a goto file.

Before this change, the name of the goto file was redundant since all harnesses metadata pointed to the same goto file. With this change, we extract the goto file for each harness from the metadata.

@celinval
Copy link
Contributor Author

celinval commented May 14, 2023

Btw, the crate metadata is defined here:

pub struct KaniMetadata {
/// The crate name from which this metadata was extracted.
pub crate_name: String,
/// The proof harnesses (`#[kani::proof]`) found in this crate.
pub proof_harnesses: Vec<HarnessMetadata>,
/// The features found in this crate that Kani does not support.
/// (These general translate to `assert(false)` so we can still attempt verification.)
pub unsupported_features: Vec<UnsupportedFeature>,
/// If crates are built in test-mode, then test harnesses will be recorded here.
pub test_harnesses: Vec<HarnessMetadata>,
}

It holds metadata for proof and test harnesses. In each harness metadata, we store the path to the model here:

/// Optional modeling file that was generated by the compiler that includes this harness.
pub goto_file: Option<PathBuf>,

That said, I was hoping we could change that field to be a vector of artifacts instead. It would simplify the driver and also be engine agnostic.

Copy link
Contributor

@zhassan-aws zhassan-aws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @celinval! Can we evaluate the impact of this change on the compilation time of a big project (e.g. s2n-quic)? The verification time numbers look great, but I wanted to make sure compilation time is not severely impacted.

kani-driver/src/call_cbmc_viewer.rs Show resolved Hide resolved
@celinval
Copy link
Contributor Author

celinval commented May 15, 2023

Thanks @celinval! Can we evaluate the impact of this change on the compilation time of a big project (e.g. s2n-quic)? The verification time numbers look great, but I wanted to make sure compilation time is not severely impacted.

Totally! That's exactly what I've been doing now. I noticed that the "Kani CI / perf" job finished rather quickly, so I'm hoping the overall time is still better. That said, the compilation time does become significant and it increases the need for optimizing the compiler.

Unrelated to this change, I've also noticed that Kani driver is starting to consume a lot of memory when verifying the s2n-quic-core crate.

@celinval
Copy link
Contributor Author

celinval commented May 15, 2023

I manually executed s2n-quic to get the overall execution time and compilation times using a release build. I compared the head of this PR (fd0075c) and main (7c4400d):

Main PR
compilation time 21s 35s
overall time 12min 30s 8min 24s

I think overall this change is still fairly beneficial for our users. I suggest that we prioritize compiler performance improvements to a follow up PR. I created an issue for adding more data to benchcomp and to investigate optimizations.

@zhassan-aws
Copy link
Contributor

Nice! Ship it!

@celinval celinval enabled auto-merge (squash) May 16, 2023 00:14
@celinval celinval merged commit 2a09d79 into model-checking:main May 16, 2023
celinval added a commit that referenced this pull request May 17, 2023
This fixes a regression introduced in #2439 when write symtab json is enabled. We still need to take that into consideration and remove them if needed.

This change also simplifies the write symtab json regression to avoid the out of disk space issue we've been seeing since #2439.
celinval added a commit that referenced this pull request May 19, 2023
This fixes a regression from #2439. The compiler should store the location of the function body instead of the declaration. Storing the correct location fixes how concrete playback stores the generated unit test.
celinval added a commit that referenced this pull request Jun 22, 2023
This was a regression introduced by #2439. We were still writing the result of the reachability algorithm to the same file for every harness. Thus, we could only see the MIR for the last harness that was processed.

Use the file name that is specific for the harness instead and generate one MIR file per harness like we do with other files generated by kani-compiler.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Per-harness code generation
2 participants